Recognizing Coordinate Structures for Machine Translation of English Patent Documents

Authors

  • Yoon-Hyung Roh
  • Ki-Young Lee
  • Sung-Kwon Choi
  • Oh-Woog Kwon
  • Young-Gil Kim
Abstract

Patent machine translation is one of the main target areas of current practical MT systems. Patent documents have their own peculiar description style. In particular, abstracts and claims in patent documents are characterized by long and complex syntactic structures, which are often caused by coordination. Syntactic analysis of patent documents therefore requires special treatment of coordination. This paper describes a method for handling long sentences in patent documents by recognizing coordinate structures. Coordinate structures are recognized using a similarity table that reflects the parallelism between conjuncts. Our method is applied to a practical MT system and improves its quality and efficiency.
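
The abstract does not say how the similarity table is built or scored, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes POS-tagged input, a hypothetical scoring scheme (2 for the same word and tag, 1 for the same tag, -1 otherwise), and conjuncts of equal length aligned position by position around a single conjunction. All function names are illustrative.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def token_similarity(a: Token, b: Token) -> int:
    """Hypothetical score: -1 for different tags, 1 for same tag, 2 for same word and tag."""
    if a[1] != b[1]:
        return -1
    return 2 if a[0].lower() == b[0].lower() else 1


def build_similarity_table(tokens: List[Token], conj: int) -> List[List[int]]:
    """Rows are tokens left of the conjunction, columns are tokens right of it."""
    left = tokens[:conj]
    right = tokens[conj + 1:]
    return [[token_similarity(l, r) for r in right] for l in left]


def best_conjuncts(tokens: List[Token], conj: int):
    """Pick the conjunct length k whose aligned pairs have the highest total similarity.

    Assumes (for illustration only) that both conjuncts have the same length
    and are adjacent to the conjunction. Returns half-open index spans
    (left_span, right_span) and the winning score.
    """
    table = build_similarity_table(tokens, conj)
    n_left = conj
    n_right = len(tokens) - conj - 1
    best_k, best_score = 0, 0
    for k in range(1, min(n_left, n_right) + 1):
        # Left conjunct is tokens[conj-k:conj]; right is tokens[conj+1:conj+1+k].
        score = sum(table[conj - k + i][i] for i in range(k))
        if score > best_score:
            best_k, best_score = k, score
    return (conj - best_k, conj), (conj + 1, conj + 1 + best_k), best_score


sentence = [
    ("He", "PRP"), ("bought", "VBD"), ("a", "DT"), ("red", "JJ"), ("apple", "NN"),
    ("and", "CC"),
    ("a", "DT"), ("green", "JJ"), ("pear", "NN"), ("yesterday", "RB"),
]
left_span, right_span, score = best_conjuncts(sentence, conj=5)
print(left_span, right_span, score)  # (2, 5) (6, 9) 4
```

Here the spans select "a red apple" and "a green pear" as the two conjuncts. Once conjunct boundaries are fixed in this way, a parser can treat the coordinated phrase as a single unit, which is one plausible reason such recognition helps with long patent sentences.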


Similar resources

Customizing an English-Korean Machine Translation System for Patent/Technical Documents Translation

This paper addresses a method for customizing an English-Korean machine translation system from the general domain to the patent or technical document domain. The customizing method includes the following: (1) adapting the probabilities of a POS tagger trained on the general domain to the specific domain, (2) syntactically analyzing long and complex sentences by recognizing coordinate structures, and (3) ...


English-Korean Patent Translation System: FromTo-EK/PAT

This paper addresses a method for customizing an English-Korean machine translation system from the general domain to the patent domain. The customizing method includes the following: (1) extracting and constructing large bilingual terminology and the patent-specific translation patterns, (2) adapting the probabilities of a POS tagger trained on the general domain to the patent domain, (3) syntactically a...


Customizing an English-Korean Machine Translation System for Patent Translation

This paper addresses a method for customizing an English-to-Korean machine translation system from the general domain to the patent domain. The customizing method consists of the following steps: 1) linguistically studying the characteristics of patent documents, 2) extracting unknown words from large patent documents and constructing large bilingual terminology, 3) extracting and constructing the patent...


Toward the Evaluation of Machine Translation Using Patent Information

To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted automatically ...


Building a Statistical Machine Translation System for Translating Patent Documents

This paper describes the work we conducted for building a statistical machine translation (SMT) system for the Chinese-English subtask of the NTCIR-9 patent MT evaluation. Our results show that most of the generic techniques we had developed for improving SMT performance work on patent data as well, and the changes we made to our SMT system training procedure in order to address special charact...



Publication date: 2008